Japanese Dialogue Corpus of Multi-Level Annotation

Author

  • Shu Nakazato
Abstract

This paper describes a Japanese dialogue corpus annotated with multi-level information, built by the Japanese Discourse Research Initiative, Japanese Society for Artificial Intelligence. The annotations consist of speech, transcription delimited by slash units, prosodic information, part of speech, dialogue acts, and dialogue segmentation. In the project, we used the corpus to obtain new findings by examining the relationship between linguistic information and dialogue acts, the relationship between prosodic information and dialogue segments, and the characteristics of agreement/disagreement expressions and non-sentence elements.

1 Introduction

This paper describes a Japanese dialogue corpus annotated with multi-level information, such as speech, linguistic, and discourse information, built by the Japanese Discourse Research Initiative and supported by the Japanese Society for Artificial Intelligence. Dialogue corpora are now indispensable to the speech and language research communities. The corpora have been used not only for examining the relationship between speech and linguistic phenomena, but also for building speech and language understanding systems. Sharing corpora among researchers is highly desirable, since creating them incurs considerable costs: writing and revising annotation manuals, annotating the data, and checking the consistency and reliability of the annotated data.

The Discourse Research Initiative was set up in March 1996 by US, European, and Japanese researchers to develop standardized discourse annotation schemes (Carletta et al., 1997; Core et al., 1998). The efforts of the initiative have been called 'standardization', but this name is misleading, to say the least. In typical standardization efforts, such as those in audio-visual and telecommunication technologies, commercial companies try to expand the market for their products or interfaces through the standard. The objective of standardization efforts in discourse, by contrast, is to promote interaction among discourse researchers and thereby provide a solid foundation for corpus-based discourse research, avoiding duplicated resource-building efforts and increasing sharable resources.

In cooperation with this initiative, the Japanese Discourse Research Initiative was started in Japan in May 1996, supported by the Japanese Society for Artificial Intelligence (JDRI, 1996; Ichikawa et al., 1999). The activities of the initiative involve creating and revising annotation schemes based on a survey of existing schemes and on annotation experiments, annotating corpora based on the proposed annotation schemes, and doing research using the corpora, not only to examine the utility of the schemes and corpora but also to obtain new findings.


Similar resources

Standoff Coordination for Multi-Tool Annotation in a Dialogue Corpus

The LUNA corpus is a multi-lingual, multidomain spoken dialogue corpus currently under development that will be used to develop a robust natural spoken language understanding toolkit for multilingual dialogue services. The LUNA corpus will be annotated at multiple levels to include annotations of syntactic, semantic, and discourse information; specialized annotation tools will be used for the a...


Co-reference annotation and resources: A multilingual corpus of typologically diverse languages

This article introduces a dialogue corpus containing data from two typologically different languages, Japanese and Kilivila. The corpus is annotated in accordance with language specific annotation schemes for co-referential and similar relations. The article describes the corpus data, the properties of language specific co-reference in the two languages and a methodology for its annotation. Exa...


A corpus for studying addressing behavior in multi-party dialogues

This paper describes a multi-modal corpus of hand-annotated meeting dialogues that was designed for studying addressing behavior in face-to-face conversations. The corpus contains annotated dialogue acts, addressees, adjacency pairs and gaze direction. First, we describe the corpus design where we present the annotation schema, annotation tools and annotation process itself. Then, we analyze th...


Evaluation of Transcription and Annotation Tools for a Multi-modal, Multi-party Dialogue Corpus

This paper reviews nine available transcription and annotation tools, considering in particular the special difficulties arising from transcribing and annotating multi-party, multi-modal dialogue. Tools are evaluated as to the ability to support the user’s annotation scheme, ability to visualize the form of the data, compatibility with other tools, flexibility of data representation, and genera...


Creating spoken dialogue characters from corpora without annotations

Virtual humans are being used in a number of applications, including simulation-based training, multi-player games, and museum kiosks. Natural language dialogue capabilities are an essential part of their human-like persona. These dialogue systems have a goal of being believable and generally have to operate within the bounds of their restricted domains. Most dialogue systems operate on a dialo...




Publication date: 2000